ACFL23 Instruction Support #425

Open
wants to merge 38 commits into
base: dev

JosephMoore25
Contributor

@JosephMoore25 JosephMoore25 commented Aug 30, 2024

Merging work done towards enabling support for a few codes compiled with ACFL 23, namely STREAM, miniBUDE, CloverLeaf, TeaLeaf, and Minisweep.

This PR is mostly made up of added instruction support. 58 instructions have been added, of which 24 are unique and the remainder are variants. Most instructions are SVE, with some NEON added.

An additional feature of "infinite loop checking" has been added. This adds a counter in the ROB which throws an error if the same address has been at the head of the ROB for a very long time. This catches a few errors previously found where an erroneous config or broken logic can cause SimEng to get caught in a loop and sometimes eventually hit OOM.
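The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `HeadStallWatchdog`, the `observe` method, and the threshold value are all hypothetical, and the real check lives inside the ROB commit logic.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical watchdog (not SimEng's actual implementation): counts how many
// consecutive observations see the same instruction address at the ROB head.
class HeadStallWatchdog {
 public:
  explicit HeadStallWatchdog(uint64_t threshold) : threshold_(threshold) {
    assert(threshold > 0 && "threshold must be non-zero");
  }

  // Call once per commit attempt with the address currently at the ROB head.
  // Throws once the same address has been seen `threshold` times in a row
  // after its first appearance.
  void observe(uint64_t headAddress) {
    if (hasLast_ && headAddress == lastAddress_) {
      if (++repeatCount_ >= threshold_) {
        throw std::runtime_error("ROB head has not changed; possible infinite loop");
      }
    } else {
      // A new address reached the head, so the pipeline is making progress.
      lastAddress_ = headAddress;
      hasLast_ = true;
      repeatCount_ = 0;
    }
  }

  uint64_t repeats() const { return repeatCount_; }

 private:
  uint64_t threshold_;
  uint64_t lastAddress_ = 0;
  uint64_t repeatCount_ = 0;
  bool hasLast_ = false;
};
```

The per-observation cost in this sketch is one comparison and one increment.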

This also includes a fix for an OpenMP bug that has previously popped up with ACFL 23 support; Jack had done this work in a separate branch.

Tests are still being added, and the new group tests need to be added for all instructions. The PR will leave draft stage once all tests have been added.

Here is a list of the instructions added:

| Opcode | Instruction format | General test added? | Group test added? |
| --- | --- | --- | --- |
| Opcode::AArch64_UADDLVv8i8v | uaddlv hd, vn.8b | Yes | Yes |
| Opcode::AArch64_FTSMUL_ZZZ_S | ftsmul zd.s, zn.s, zm.s | Yes | Yes |
| Opcode::AArch64_FTSMUL_ZZZ_D | ftsmul zd.d, zn.d, zm.d | Yes | Yes |
| Opcode::AArch64_FTSSEL_ZZZ_S | ftssel zd.s, zn.s, zm.s | Yes | Yes |
| Opcode::AArch64_FTSSEL_ZZZ_D | ftssel zd.d, zn.d, zm.d | Yes | Yes |
| Opcode::AArch64_FTMAD_ZZI_D | ftmad zd.d, zn.d, zm.d, #imm | Yes | Yes |
| Opcode::AArch64_CMEQv2i32rz | cmeq vd.2s, vn.2s, #0 | Yes | Yes |
| Opcode::AArch64_CMHIv2i32 | cmhi vd.2s, vn.2s, vm.2s | Yes | Yes |
| Opcode::AArch64_CMPHS_PPzZZ_B | cmphs pd.b, pg/z, zn.b, zm.b | Yes | Yes |
| Opcode::AArch64_CMPHS_PPzZZ_D | cmphs pd.d, pg/z, zn.d, zm.d | Yes | Yes |
| Opcode::AArch64_CMPHS_PPzZZ_H | cmphs pd.h, pg/z, zn.h, zm.h | Yes | Yes |
| Opcode::AArch64_CMPHS_PPzZZ_S | cmphs pd.s, pg/z, zn.s, zm.s | Yes | Yes |
| Opcode::AArch64_CPY_ZPmV_B | cpy zd.b, pg/m, vn.b | Yes | Yes |
| Opcode::AArch64_CPY_ZPmV_D | cpy zd.d, pg/m, vn.d | Yes | Yes |
| Opcode::AArch64_CPY_ZPmV_H | cpy zd.h, pg/m, vn.h | Yes | Yes |
| Opcode::AArch64_CPY_ZPmV_S | cpy zd.s, pg/m, vn.s | Yes | Yes |
| Opcode::AArch64_FDIVv4f32 | fdiv vd.4s, vn.4s, vm.4s | Yes | Yes |
| Opcode::AArch64_LASTB_VPZ_D | lastb dd, pg, zn.d | Yes | Yes |
| Opcode::AArch64_LASTB_VPZ_S | lastb sd, pg, zn.s | Yes | Yes |
| Opcode::AArch64_LASTB_VPZ_H | lastb hd, pg, zn.h | Yes | Yes |
| Opcode::AArch64_LASTB_VPZ_B | lastb bd, pg, zn.b | Yes | Yes |
| Opcode::AArch64_CLASTB_VPZ_D | clastb dd, pg, dn, zn.d | Yes | Yes |
| Opcode::AArch64_CLASTB_VPZ_S | clastb sd, pg, sn, zn.s | Yes | Yes |
| Opcode::AArch64_CLASTB_VPZ_H | clastb hd, pg, hn, zn.h | Yes | Yes |
| Opcode::AArch64_CLASTB_VPZ_B | clastb bd, pg, bn, zn.b | Yes | Yes |
| Opcode::AArch64_LDAXRB | ldaxrb wt, [xn] | Yes | Yes |
| Opcode::AArch64_LDRSWroW | ldrsw xt, [xn, wm, {extend {#amount}}] | Yes | Yes |
| Opcode::AArch64_ORNv8i8 | orn vd.8b, vn.8b, vm.8b | Yes | Yes |
| Opcode::AArch64_PFIRST_B | pfirst pdn.b, pg, pdn.b | Yes | Yes |
| Opcode::AArch64_PNEXT_B | pnext pdn.b, pv, pdn.b | Yes | Yes |
| Opcode::AArch64_PNEXT_H | pnext pdn.h, pv, pdn.h | Yes | Yes |
| Opcode::AArch64_PNEXT_S | pnext pdn.s, pv, pdn.s | Yes | Yes |
| Opcode::AArch64_PNEXT_D | pnext pdn.d, pv, pdn.d | Yes | Yes |
| Opcode::AArch64_SMAX_ZI_D | smax zdn.d, zdn.d, #imm | Yes | Yes |
| Opcode::AArch64_SMAX_ZI_H | smax zdn.h, zdn.h, #imm | Yes | Yes |
| Opcode::AArch64_SMAX_ZI_B | smax zdn.b, zdn.b, #imm | Yes | Yes |
| Opcode::AArch64_SMAX_ZPmZ_D | smax zd.d, pg/m, zn.d, zm.d | Yes | Yes |
| Opcode::AArch64_SMAX_ZPmZ_H | smax zd.h, pg/m, zn.h, zm.h | Yes | Yes |
| Opcode::AArch64_SMAX_ZPmZ_B | smax zd.b, pg/m, zn.b, zm.b | Yes | Yes |
| Opcode::AArch64_SMINV_VPZ_D | sminv dd, pg, zn.d | Yes | Yes |
| Opcode::AArch64_SMINV_VPZ_H | sminv hd, pg, zn.h | Yes | Yes |
| Opcode::AArch64_SMINV_VPZ_B | sminv bd, pg, zn.b | Yes | Yes |
| Opcode::AArch64_SMIN_ZPmZ_D | smin zd.d, pg/m, zn.d, zm.d | Yes | Yes |
| Opcode::AArch64_SMIN_ZPmZ_H | smin zd.h, pg/m, zn.h, zm.h | Yes | Yes |
| Opcode::AArch64_SMIN_ZPmZ_B | smin zd.b, pg/m, zn.b, zm.b | Yes | Yes |
| Opcode::AArch64_SPLICE_ZPZ_D | splice zdn.d, pv, zdn.d, zm.d | Yes | Yes |
| Opcode::AArch64_SPLICE_ZPZ_S | splice zdn.s, pv, zdn.s, zm.s | Yes | Yes |
| Opcode::AArch64_STLXRB | stlxrb ws, wt, [xn] | Yes | Yes |
| Opcode::AArch64_STLXRH | stlxrh ws, wt, [xn] | Yes | Yes |
| Opcode::AArch64_STLXR | stlxr ws, {w,x}t, [xn] | Yes | Yes |
| Opcode::AArch64_UMAXVv16i8v | umaxv bd, vn.16b | Yes | Yes |
| Opcode::AArch64_UMAXVv4i16v | umaxv hd, vn.4h | Yes | Yes |
| Opcode::AArch64_UMAXVv4i32v | umaxv sd, vn.4s | Yes | Yes |
| Opcode::AArch64_UMAXVv8i16v | umaxv hd, vn.8h | Yes | Yes |
| Opcode::AArch64_UMAXVv8i8v | umaxv bd, vn.8b | Yes | Yes |
| Opcode::AArch64_WHILELS_PXX_B | whilels pd.b, xn, xm | Yes | Yes |
| Opcode::AArch64_WHILELS_PXX_D | whilels pd.d, xn, xm | Yes | Yes |
| Opcode::AArch64_WHILELS_PXX_H | whilels pd.h, xn, xm | Yes | Yes |
| Opcode::AArch64_WHILELS_PXX_S | whilels pd.s, xn, xm | Yes | Yes |
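
For anyone checking expected values in the tests for these instructions, a scalar reference model can be handy. The sketch below covers two of the listed instructions, `umaxv` (unsigned maximum across lanes) and `cmphs` with zeroing predication (unsigned higher-or-same compare). The helper names `refUmaxv` and `refCmphs` are hypothetical and are not SimEng helpers; predicates are modelled as one `bool` per element, and the condition flags that `cmphs` also sets are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical reference for `umaxv`: unsigned maximum over all vector lanes.
template <typename T>
T refUmaxv(const std::vector<T>& vn) {
  static_assert(std::is_unsigned<T>::value, "umaxv operates on unsigned lanes");
  assert(!vn.empty());
  T best = vn[0];
  for (T v : vn) {
    if (v > best) best = v;
  }
  return best;
}

// Hypothetical reference for `cmphs pd, pg/z, zn, zm`: an active lane is set
// when zn >= zm (unsigned); inactive lanes are zeroed. NZCV flags are ignored.
template <typename T>
std::vector<bool> refCmphs(const std::vector<bool>& pg,
                           const std::vector<T>& zn,
                           const std::vector<T>& zm) {
  static_assert(std::is_unsigned<T>::value, "cmphs is an unsigned compare");
  assert(pg.size() == zn.size() && zn.size() == zm.size());
  std::vector<bool> pd(pg.size(), false);
  for (std::size_t i = 0; i < pg.size(); ++i) {
    pd[i] = pg[i] && (zn[i] >= zm[i]);
  }
  return pd;
}
```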

@JosephMoore25 JosephMoore25 added the 0.9.7 Part of SimEng Release 0.9.7 label Aug 30, 2024
@JosephMoore25 JosephMoore25 self-assigned this Aug 30, 2024
@JosephMoore25 JosephMoore25 marked this pull request as ready for review December 10, 2024 13:35
Contributor

@ABenC377 ABenC377 left a comment

Code all looks good. Though, I'm not sure about the infinite loop checker. If I've missed a discussion about this then please ignore me. But if it is being added as a workaround for problems caused by erroneous configs or broken logic, shouldn't we be fixing the causes, not the symptoms? Erroneous configs are kind of a user error, but we could update the documentation to help users avoid them, and if there is broken logic in SimEng we should be fixing it, not plastering over the resulting problem. I suppose I can see the value of this type of check in debug mode to flag a problem to users if they are running into issues in release, but it seems like unnecessary overhead for Release.

Contributor

@FinnWilkinson FinnWilkinson left a comment

Most bits look good. Comments are mainly about adhering to the project's style, plus some confusion about the SVE helpers.

src/include/simeng/pipeline/ReorderBuffer.hh (outdated, resolved)
src/include/simeng/pipeline/ReorderBuffer.hh (outdated, resolved)
src/lib/arch/aarch64/ExceptionHandler.cc (outdated, resolved)
src/lib/arch/aarch64/ExceptionHandler.cc (outdated, resolved)
src/lib/arch/aarch64/InstructionMetadata.cc (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
src/lib/arch/aarch64/Instruction_execute.cc (outdated, resolved)
src/lib/arch/aarch64/Instruction_execute.cc (outdated, resolved)
Contributor

@FinnWilkinson FinnWilkinson left a comment

All looking pretty good - just a few changes needed and some pedantic comments on comments 😅

src/include/simeng/arch/aarch64/helpers/neon.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/neon.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
src/include/simeng/arch/aarch64/helpers/sve.hh (outdated, resolved)
std::cerr << "[SimEng:ReorderBuffer] Infinite loop detected in rob "
"commit at instruction address "
<< std::hex << uop->getInstructionAddress() << std::dec
<< " (" << uop->getMicroOpIndex() << ")." << std::endl;
Contributor

What's the rationale for printing the micro-op index?

Contributor Author

This may give the user additional context about what exactly is stuck at the head of the ROB if the instruction has been split into micro-ops. I have updated the comment generally, though we should have a discussion offline about what exactly we want to print out in these cases.

Contributor Author

Following offline discussion, we agreed to keep this message in release mode, but to add more detail so that the user is aware of why it is being triggered, what is triggering it, and what to do to resolve the issue. The micro-op index in particular just adds extra detail, so it remains in the message.

@@ -1080,7 +1080,7 @@ TEST_P(Syscall, sched_getaffinity) {
)");
EXPECT_EQ(getGeneralRegister<int64_t>(21), -1);
EXPECT_EQ(getGeneralRegister<int64_t>(22), -1);
EXPECT_EQ(getGeneralRegister<int64_t>(23), 1);
Contributor

What has caused this to change?

Contributor

See below

stateChange = {ChangeType::REPLACEMENT, {R0}, {retval}};
stateChange.memoryAddresses.push_back({mask, 1});
uint64_t retval = static_cast<uint64_t>(bitmask);
stateChange = {ChangeType::REPLACEMENT, {R0}, {sizeof(retval)}};
Contributor

The man page for sched_getaffinity states that the function returns 0 on success and -1 on failure. This seems to be returning the size of a uint64_t which will always be 8. I think this is incorrect.

What I think you have done is update the value being set in memory correctly on line 434 (updating the size), but also update the value returned to the program to be the size on line 433. Depending on the behaviour we want, line 433 should potentially be restored to what it was previously, i.e. set to 0 if pid == 0 and -1 otherwise.

What was the reason for the update?

Contributor Author

This is worth @jj16791 investigating as it was his find/fix so he will know more than I do on the issue.

The reason given at the time was:

The assert you were triggering was KMP_ASSERT(__kmp_avail_proc == __kmp_topology->get_num_hw_threads());. Newer LLVM OMP runtimes require the affinity mask to be at least 8 bytes in length otherwise it will read the number of available cores out as 0 due to some casting. The affinity mask we were returning was 1 byte in length hence the assert triggered as __kmp_avail_proc was 0. Figured it out from a combination of isolating the instructions run leading up to this assert and then from GodBolt/SimEng figuring out why our mask was being converted to 0 procs available

I've been testing using a STREAM binary (with OpenMP support) compiled with ACFL23. With the current fix, this works. Removing the sizeof on 433 means that this fails. I do agree though that the current implementation doesn't line up with what I'd expect should work.
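
The failure mode described in the quoted explanation can be illustrated with a small model. This is not the LLVM OpenMP runtime's code: `countAvailableCpus` is a hypothetical consumer, and it assumes the runtime walks the affinity mask in 8-byte words based on the length the syscall reports. For context, the Linux man page for `sched_getaffinity` notes that the raw system call, unlike the glibc wrapper, returns the number of bytes copied into the mask on success, which may be why a non-zero return value works here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical consumer of an affinity mask (an assumption, not LLVM's code):
// counts set bits by reading whole 64-bit words, using the byte count that the
// syscall reported as the mask length.
int countAvailableCpus(const std::vector<uint8_t>& mask,
                       std::size_t bytesReported) {
  // Integer division truncates: a 1-byte mask yields 0 words to inspect.
  const std::size_t words = bytesReported / sizeof(uint64_t);
  assert(mask.size() >= words * sizeof(uint64_t));
  int count = 0;
  for (std::size_t w = 0; w < words; ++w) {
    uint64_t word = 0;
    std::memcpy(&word, mask.data() + w * sizeof(uint64_t), sizeof(uint64_t));
    // Clear the lowest set bit each iteration to count set bits.
    for (; word != 0; word &= word - 1) ++count;
  }
  return count;
}
```

With a mask whose first byte is 1, a reported length of 1 byte gives 0 available CPUs in this model, while a reported length of 8 bytes gives 1.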

Labels
0.9.7 Part of SimEng Release 0.9.7
Projects
Status: Changes Requested
4 participants